import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.io as pio
import seaborn as sns
df = pd.read_csv("dataset_2019_2022_new.csv")
df.transaction_date = pd.to_datetime(df.transaction_date)
df['year'] = df.transaction_date.dt.year
df.head(10)
| customer_id | product_id | basket_id | loyalty | household_type | age_band | department | brand | commodity | store | price | transaction_date | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15803 | 1131974 | 57266 | Loyalist | 1 adult with kids | 19-24 | Grocery | private | Baked bread/buns/rolls | 374 | 0.99 | 2020-05-10 | 2020 |
| 1 | 15803 | 1051516 | 57266 | Loyalist | 1 adult with kids | 19-24 | Produce | national | Vegetables - all others | 374 | 0.70 | 2020-10-24 | 2020 |
| 2 | 15803 | 967254 | 57266 | Loyalist | 1 adult with kids | 19-24 | Pharmaceutical | national | Cold and flu | 374 | 1.68 | 2020-10-18 | 2020 |
| 3 | 15803 | 1134222 | 57266 | Loyalist | 1 adult with kids | 19-24 | Grocery | private | Paper housewares | 374 | 2.59 | 2020-10-23 | 2020 |
| 4 | 15803 | 1003421 | 57266 | Loyalist | 1 adult with kids | 19-24 | Grocery | national | Soup | 374 | 0.60 | 2020-10-27 | 2020 |
| 5 | 15803 | 937791 | 57266 | Loyalist | 1 adult with kids | 19-24 | Grocery | national | Frozen pizza | 374 | 2.25 | 2020-10-28 | 2020 |
| 6 | 15803 | 1082185 | 57266 | Loyalist | 1 adult with kids | 19-24 | Produce | national | Tropical fruit | 374 | 1.07 | 2020-12-10 | 2020 |
| 7 | 15803 | 940947 | 57266 | Loyalist | 1 adult with kids | 19-24 | Meat | national | Heat/serve | 374 | 5.00 | 2020-02-10 | 2020 |
| 8 | 15803 | 1056010 | 57266 | Loyalist | 1 adult with kids | 19-24 | Nutrition | national | Prepared food | 374 | 3.00 | 2020-10-17 | 2020 |
| 9 | 15803 | 845307 | 57266 | Loyalist | 1 adult with kids | 19-24 | Deli | national | Deli meats | 374 | 3.17 | 2020-10-24 | 2020 |
df.groupby(['loyalty']).agg(revenue=('price',sum)).reset_index()
| loyalty | revenue | |
|---|---|---|
| 0 | First Time Buyer | 3524.53 |
| 1 | Loyalist | 102222.87 |
| 2 | Promiscuous | 145233.07 |
lcolor_pie = ['#66B2FF','#9999FF','#6666FF']
tmp = df.groupby(['loyalty']).agg(revenue=('price',sum)).reset_index()
#pie chart
fig = go.Figure(
data=[go.Pie(labels=tmp.loyalty, values=tmp.revenue, marker=dict(colors=lcolor_pie))],
layout = go.Layout(
title = 'Revenue Distribution by Loyalty'
)
)
fig.show()
The pie chart provides a clear picture of the revenue distribution among the loyalty types over a four year timeframe. Promiscuous customers stand out as the primary revenue drivers, followed by Loyalists, while First time Buyers make a fair contribution. These insights underscore the necessity of targeted strategies that focus on maximising revenue potential from Promiscuous customers to Loyalists, while also devising effective conversion tactics to elevate the revenue contribution of First Time Buyers.
Identify the following:
#transactions by year
df.groupby(['year','loyalty']).agg(transactions=('basket_id', pd.Series.nunique)).reset_index()
| year | loyalty | transactions | |
|---|---|---|---|
| 0 | 2019 | First Time Buyer | 18 |
| 1 | 2019 | Loyalist | 363 |
| 2 | 2019 | Promiscuous | 635 |
| 3 | 2020 | First Time Buyer | 19 |
| 4 | 2020 | Loyalist | 409 |
| 5 | 2020 | Promiscuous | 604 |
| 6 | 2021 | First Time Buyer | 12 |
| 7 | 2021 | Loyalist | 469 |
| 8 | 2021 | Promiscuous | 556 |
| 9 | 2022 | First Time Buyer | 3 |
| 10 | 2022 | Loyalist | 172 |
| 11 | 2022 | Promiscuous | 238 |
import plotly.io as pio
data = []
lcolor={'Loyalist': '#9999FF', 'Promiscuous':'#6666FF', 'First Time Buyer':'#66B2FF'}
for d in df.loyalty.unique():
tmp = df[df.loyalty==d].groupby(['year']).agg(transactions=('basket_id', pd.Series.nunique)).reset_index()
data.append(go.Scatter(x=tmp.year, y=tmp.transactions, name = d, line=dict(dash='dashdot'), line_color=lcolor[d]))
fig = go.Figure(
data=data,
layout = go.Layout(
title ='Transaction Frequency by Loyalty',
yaxis=dict(
title='Transactions'
)
)
)
fig.show()
The insights drawn from this line chart explain the patterns in customer transaction activities.
For Promiscuous customers, the line chart depicts a gradual decline in transaction frequency over the four years. In 2019, Promiscuous customers conducted 635 transactions, which dipped to 604 in 2020 and further decreased to 556 in 2021. However, the most significant drop is observed in 2022, where transaction frequency plummeted to a mere 238. This severe decline raises questions about the factors contributing to this drop and warrants a thorough examination of potential reasons, such as changing consumer preferences, external economic influences, or alterations in marketing strategies.
In contrast, Loyalist customers display a more fluctuating trend in transaction frequency. In 2019, Loyalist customers engaged in 363 transactions, which increased to 409 in 2020. The upward trajectory continued in 2021, reaching 469 transactions, indicating a positive growth pattern. However, mirroring the trend seen among Promiscuous customers, there is a substantial drop in transaction frequency for Loyalist customers in 2022, where the count decreased to 172. This sharp decline necessitates a deeper analysis to identify the triggers for this downturn and inform strategic interventions.
First Time buyers consistently exhibit lower transaction frequencies across the entire four-year span. With only 18 transactions in 2019, 19 in 2020, 12 in 2021, and a mere 3 in 2022, the line chart accentuates the challenge in converting First Time buyers into more active participants. This persistent low transaction frequency underscores the need for dedicated efforts to enhance engagement, loyalty, and repeat business from this segment.
The declining transaction frequency among Promiscuous and Loyalist customers in 2022, along with the consistent low engagement of First Time buyers, present strategic opportunities for SuperFoodsMax. Addressing these trends requires an informed approach that integrates customer insights, loyalty enhancement strategies, and potentially re-evaluating marketing tactics to invigorate transaction frequency and foster sustained revenue growth.
tmp = df.groupby(['year','loyalty','basket_id']).agg(revenue=('price', sum)).reset_index()
tmp.groupby(['year','loyalty']).agg(avg_spend=('revenue','mean')).reset_index()
| year | loyalty | avg_spend | |
|---|---|---|---|
| 0 | 2019 | First Time Buyer | 72.888333 |
| 1 | 2019 | Loyalist | 73.969669 |
| 2 | 2019 | Promiscuous | 70.016331 |
| 3 | 2020 | First Time Buyer | 67.875263 |
| 4 | 2020 | Loyalist | 71.827433 |
| 5 | 2020 | Promiscuous | 72.023096 |
| 6 | 2021 | First Time Buyer | 61.524167 |
| 7 | 2021 | Loyalist | 71.833070 |
| 8 | 2021 | Promiscuous | 71.676331 |
| 9 | 2022 | First Time Buyer | 61.540000 |
| 10 | 2022 | Loyalist | 71.539244 |
| 11 | 2022 | Promiscuous | 73.187857 |
tmpx = tmp.groupby(['year','loyalty']).agg(avg_spend=('revenue','mean')).reset_index()
tmpx['avg_spend'] = tmpx['avg_spend'].round(2)
tmpx
| year | loyalty | avg_spend | |
|---|---|---|---|
| 0 | 2019 | First Time Buyer | 72.89 |
| 1 | 2019 | Loyalist | 73.97 |
| 2 | 2019 | Promiscuous | 70.02 |
| 3 | 2020 | First Time Buyer | 67.88 |
| 4 | 2020 | Loyalist | 71.83 |
| 5 | 2020 | Promiscuous | 72.02 |
| 6 | 2021 | First Time Buyer | 61.52 |
| 7 | 2021 | Loyalist | 71.83 |
| 8 | 2021 | Promiscuous | 71.68 |
| 9 | 2022 | First Time Buyer | 61.54 |
| 10 | 2022 | Loyalist | 71.54 |
| 11 | 2022 | Promiscuous | 73.19 |
data = []
tmp = df.groupby(['year','loyalty','basket_id']).agg(revenue=('price', sum)).reset_index()
for d in tmp.loyalty.unique():
tmp1 = tmp[tmp.loyalty==d].groupby(['year']).agg(avg_spend=('revenue', 'mean')).reset_index()
tmp1['avg_spend'] = tmp1['avg_spend'].round(0)
data.append(go.Bar(x=tmp1.year, y=tmp1.avg_spend, name = d, text=tmp1.avg_spend, marker_color=lcolor[d]))
fig = go.Figure(
data = data,
layout = go.Layout(
title ='Average Spending Amount per Loyalty',
yaxis=dict(
title='Average Spend'
)
)
)
fig.show()
Over the course of 4 years, from 2019 to 2022, the chart above reveals distinct variations in average spending per transactions among the three loyalty segments: Loyalist, Promiscuous, and First time buyers.
Loyalists exhibited the highest average spending per transaction at $73 in 2019. It went down to $72 in 2020 to 2022. Promiscuous customers spent an average of $70 on 2019 and it went up to $72 for 2020, remained consistent for 2021. It slightly increased to $73 for 2022. First Time Buyers spent an average of $73 at 2019. It went on a decline to $68 for 2020, and went further down to $62 for 2021 and 2022.
The fluctuation average spending pattern of First Time buyers call for concentrated efforts to enhance their conversion into loyal customers. Additionally, understanding the factors behind the consistent and higher spending of Promiscuous customers can inform strategies to promote loyalty and encourage increased spending accross all loyalty types. The consistent spending pattern of Loyal customers also indicates the need for initiatives to maintain their engagement and spending levels.
df.groupby(['loyalty', 'household_type']).agg(cnt_cust=('customer_id',pd.Series.nunique)).reset_index()
| loyalty | household_type | cnt_cust | |
|---|---|---|---|
| 0 | First Time Buyer | 1 adult with kids | 22 |
| 1 | First Time Buyer | 2 adults with kids | 6 |
| 2 | First Time Buyer | 2 adults with no kids | 11 |
| 3 | First Time Buyer | Single female | 4 |
| 4 | First Time Buyer | Single male | 5 |
| 5 | Loyalist | 1 adult with kids | 587 |
| 6 | Loyalist | 2 adults with kids | 159 |
| 7 | Loyalist | 2 adults with no kids | 282 |
| 8 | Loyalist | Single female | 110 |
| 9 | Loyalist | Single male | 203 |
| 10 | Promiscuous | 1 adult with kids | 198 |
| 11 | Promiscuous | 2 adults with kids | 811 |
| 12 | Promiscuous | 2 adults with no kids | 415 |
| 13 | Promiscuous | Single female | 165 |
| 14 | Promiscuous | Single male | 290 |
val = []
hh_color = ['#b45150', '#d4b897', '#273d73', '#667497','#b2c4c9' ]
for d in df.loyalty.unique():
tmp = df[df.loyalty==d].groupby(['loyalty', 'household_type']). \
agg(cnt_cust=('customer_id',pd.Series.nunique)).reset_index()
val.append(tmp.cnt_cust)
household = [d for d in tmp.household_type.unique()]
loyalty = [d for d in df.loyalty.unique()]
from plotly.subplots import make_subplots
#subplots
specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]]
fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=loyalty)
#define pie charts
fig.add_trace(go.Pie(labels=household, values=val[0], name=loyalty[0]), 1, 1)
fig.add_trace(go.Pie(labels=household, values=val[1], name=loyalty[1]), 1, 2)
fig.add_trace(go.Pie(labels=household, values=val[2], name=loyalty[2]), 2, 1)
# Tune layout and hover info
fig.update_traces(hoverinfo='label+percent+name+value', marker=dict(colors=hh_color))
fig.update(layout_title_text='Household Types Distribution by Loyalty')
#layout_showlegend=False)
fig.update_layout(height=800)
fig = go.Figure(fig)
fig.show()
The pie chart illuastrates the distribution of households accross different loyalty types within SuperFoodsMax' customer base.
df.groupby(['loyalty', 'age_band']).agg(cnt_cust=('customer_id',pd.Series.nunique)).reset_index()
| loyalty | age_band | cnt_cust | |
|---|---|---|---|
| 0 | First Time Buyer | 19-24 | 26 |
| 1 | First Time Buyer | 25-34 | 12 |
| 2 | First Time Buyer | 35-44 | 6 |
| 3 | First Time Buyer | 45-54 | 3 |
| 4 | First Time Buyer | 55-64 | 1 |
| 5 | Loyalist | 19-24 | 789 |
| 6 | Loyalist | 25-34 | 305 |
| 7 | Loyalist | 35-44 | 190 |
| 8 | Loyalist | 45-54 | 52 |
| 9 | Loyalist | 55-64 | 2 |
| 10 | Loyalist | Greater than 65 | 3 |
| 11 | Promiscuous | 19-24 | 818 |
| 12 | Promiscuous | 25-34 | 240 |
| 13 | Promiscuous | 35-44 | 14 |
| 14 | Promiscuous | 45-54 | 797 |
| 15 | Promiscuous | 55-64 | 5 |
| 16 | Promiscuous | Greater than 65 | 5 |
#make subplots
val = []
ab_color = ['#6caccc','#7dcff4','#224d62','#8c747c','#d2b9c3']
for d in df.loyalty.unique():
tmp = df[df.loyalty==d].groupby(['loyalty', 'age_band']). \
agg(cnt_cust=('customer_id',pd.Series.nunique)).reset_index()
val.append(tmp.cnt_cust)
age_band = [d for d in tmp.age_band.unique()]
loyalty = [d for d in df.loyalty.unique()]
from plotly.subplots import make_subplots
#subplots
specs = [[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]]
fig = make_subplots(rows=2, cols=2, specs=specs, subplot_titles=loyalty)
#define pie charts
fig.add_trace(go.Pie(labels=age_band, values=val[0], name=loyalty[0]), 1, 1)
fig.add_trace(go.Pie(labels=age_band, values=val[1], name=loyalty[1]), 1, 2)
fig.add_trace(go.Pie(labels=age_band, values=val[2], name=loyalty[2]), 2, 1)
# Tune layout and hover info
fig.update_traces(hoverinfo='label+percent+name+value',marker=dict(colors=hh_color))
fig.update(layout_title_text='Age Band Distribution by Loyalty')
#layout_showlegend=False)
fig.update_layout(height=800)
fig = go.Figure(fig)
fig.show()
The pie charts depicting the distribution of age bands accross different loyalty types offers valuable insights into the demographic of SuperFoodsMax's customers.